A number of recent studies have estimated the inter-galactic void probability function and investigated its
departure from various random models. We study a family of parametric statistical models based on gamma
distributions, which give realistic descriptions for other stochastic porous media. Gamma distributions
contain as a special case the exponential distributions, which correspond to the 'random' void size probability
arising from Poisson processes. The random case corresponds to the information-theoretic maximum entropy
or maximum uncertainty model. Lower entropy models correspond either to more 'clustered' structures or to
more 'dispersed' structures than expected at random. The space of parameters is a surface
with a natural Riemannian structure, the Fisher information metric. This surface contains the Poisson
processes as an isometric embedding and provides the geometric setting for quantifying departures from
randomness, and perhaps one on which evolutionary dynamics for the void size distribution may be written.
Estimates are obtained for the two parameters of the void diameter distribution for an illustrative example
of data published by Fairall.
Introduction

A number of studies over the past ten years have estimated the inter-galactic void probability function and 
investigated its departure from randomness. The basic random model is that arising from a Poisson process of 
mean density n galaxies per unit volume in a large box. Then, in a given region of volume V, the probability of 
finding exactly m galaxies is 



$$P_m = \frac{(nV)^m}{m!}\, e^{-nV}. \tag{1}$$

So the probability that the given region is devoid of galaxies is $P_0 = e^{-nV}$. Then it follows that the
probability density function for the continuous random variable $V$ in the Poisson case is

$$p_{\mathrm{random}}(V) = n\, e^{-nV}. \tag{2}$$



In comparisons with observations, this approximation fails for very large $V$ because any catalogue
covers only a finite volume.
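
The Poisson model above is easy to check numerically. The following Python sketch evaluates the count
probability and the void probability; the function names and the sample values of $n$ and $V$ are our own
illustrative assumptions and are not taken from any catalogue.

```python
import math


def poisson_count_prob(m: int, n: float, volume: float) -> float:
    """Probability of exactly m galaxies in a region: (nV)^m / m! * exp(-nV)."""
    mean = n * volume
    return mean ** m / math.factorial(m) * math.exp(-mean)


def void_probability(n: float, volume: float) -> float:
    """Probability that the region is empty, P_0 = exp(-nV)."""
    return poisson_count_prob(0, n, volume)


def random_void_pdf(volume: float, n: float) -> float:
    """Density of void volumes in the Poisson case, n * exp(-nV)."""
    return n * math.exp(-n * volume)


# Illustrative values only (assumed, not measured).
n, V = 0.5, 3.0
print("P_0 =", void_probability(n, V))
print("sum over m of P_m =", sum(poisson_count_prob(m, n, V) for m in range(60)))
print("p_random(V) =", random_void_pdf(V, n))
```

The density of void volumes is minus the derivative of $P_0$ with respect to $V$, which is why the same
exponential appears in both functions.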

A hierarchy of $N$-point correlation functions needed to represent clustering of galaxies in a complete sense was
devised by White [26], who provided explicit formulae, including their continuous limit. In particular, he
made a detailed study of the probability that a sphere of radius r is empty and showed that formally it is 
symmetrically dependent on the whole hierarchy of correlation functions. However, White concentrated his 
applications on the case when the underlying galaxy distribution was a Poisson process, the starting point for 
the present approach which is concerned with geometrizing the parameter space of departures from a Poisson 
process. 

*Presented at: Workshop on Statistics of Cosmological Data Sets NATO-ASI Isaac Newton Institute 8-13 August 1999. 
Quantifying departures from randomness of the inter-galactic void probability function
Figure 1: Probability density functions, $f(V; \mu, \beta)$, for gamma distributions of void volumes with unit mean
$\mu = 1$, and $\beta = 0.5, 1, 2$. The case $\beta = 1$ corresponds to a 'random' distribution from an underlying Poisson
process.



Geometry of gamma models for void volume statistics 

We choose a family of parametric statistical models that includes (2) as a special case. There are of course
many such families, but we take one that has been successful in modelling void size distributions in terrestrial
stochastic porous media [8] and has been used in the representation of clustering of galaxies [6]. The family of
gamma distributions has event space $\Omega = \mathbb{R}^+$, parameters $\mu, \beta \in \mathbb{R}^+$ and probability
density functions given by

$$f(V; \mu, \beta) = \left(\frac{\beta}{\mu}\right)^{\beta} \frac{V^{\beta - 1}}{\Gamma(\beta)}\, e^{-V\beta/\mu}. \tag{3}$$

Then $\bar{V} = \mu$ and $\mathrm{Var}(V) = \mu^2/\beta$, and we see that $\mu$ controls the mean of the distribution
while the spread and shape are controlled by $1/\beta$, the square of the coefficient of variation.

The special case $\beta = 1$ corresponds to the situation when $V$ represents the random or Poisson process in (2)
with $\mu = 1/n$. Thus, the family of gamma distributions can model a range of stochastic processes corresponding
to non-independent 'clumped' events, for $\beta < 1$, and dispersed events, for $\beta > 1$, as well as the random case
(cf. [5], [?]). Thus, if we think of this range of processes as corresponding to the possible distributions of centroids
of extended objects such as galaxies that are initially distributed according to a Poisson process with $\beta = 1$,
then the three possibilities are:

Chaotic or random structure with no interactions among constituents, $\beta = 1$;
Clustered structure arising from mutually attractive interactions, $\beta < 1$;
Dispersed structure arising from mutually repulsive interactions, $\beta > 1$.
Figure 1 shows a family of gamma distributions, all of unit mean, with $\beta = 0.5, 1, 2$.
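
As a minimal sketch of this family, the Python code below evaluates the gamma density with mean $\mu$ and
shape $\beta$, and checks by crude numerical integration that the mean is $\mu$ and the variance is
$\mu^2/\beta$. The function names, the integration range and the step count are our own assumptions, chosen
only for illustration.

```python
import math


def gamma_void_pdf(v: float, mu: float, beta: float) -> float:
    """Gamma density with mean mu and shape beta; its variance is mu**2 / beta."""
    if v <= 0.0:
        return 0.0
    log_f = (beta * math.log(beta / mu) + (beta - 1.0) * math.log(v)
             - math.lgamma(beta) - beta * v / mu)
    return math.exp(log_f)


def numerical_moments(mu: float, beta: float,
                      upper: float = 40.0, steps: int = 100_000):
    """Midpoint-rule estimates of (mean, variance) on [0, upper]."""
    h = upper / steps
    m1 = m2 = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        w = gamma_void_pdf(v, mu, beta) * h
        m1 += w * v
        m2 += w * v * v
    return m1, m2 - m1 * m1


# Clustered, random and dispersed cases, all at unit mean.
for beta in (0.5, 1.0, 2.0):
    mean, var = numerical_moments(1.0, beta)
    print(f"beta={beta}: mean={mean:.4f}, variance={var:.4f}, "
          f"mu^2/beta={1.0 / beta:.4f}")
```

With $\beta = 1$ the density reduces to the exponential density of the Poisson case with $n = 1/\mu$.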

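Shannon's information theoretic 'entropy' or 'uncertainty' for such stochastic processes (cf. e.g. Jaynes [14]) is
given, up to a factor, by the negative of the expectation of the logarithm of the probability density function (3),
that is

$$S_f(\mu, \beta) = -\int_0^\infty \log\big(f(V; \mu, \beta)\big)\, f(V; \mu, \beta)\, dV \tag{4}$$

$$\phantom{S_f(\mu, \beta)} = \beta + (1 - \beta)\, \frac{\Gamma'(\beta)}{\Gamma(\beta)} + \log\frac{\mu\, \Gamma(\beta)}{\beta}. \tag{5}$$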
In particular, at unit mean, the maximum entropy (or maximum uncertainty) occurs at $\beta = 1$, which is the
random case, and then $S_f(\mu, 1) = 1 + \log\mu$.
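
The location of the entropy maximum can be confirmed numerically. The sketch below evaluates the closed
form $\beta + (1 - \beta)\psi(\beta) + \log(\mu\Gamma(\beta)/\beta)$ on a grid of $\beta$ values at unit mean.
The digamma routine (recurrence plus a standard asymptotic series) and the grid are our own assumptions,
used so that the example needs only the Python standard library.

```python
import math


def digamma(x: float) -> float:
    """psi(x) for x > 0: shift x upwards by recurrence, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))


def gamma_entropy(mu: float, beta: float) -> float:
    """Entropy of the gamma density with mean mu and shape beta."""
    return (beta + (1.0 - beta) * digamma(beta)
            + math.log(mu) + math.lgamma(beta) - math.log(beta))


# Grid search at unit mean; the grid contains beta = 1 exactly.
betas = [0.25 * k for k in range(1, 17)]
best = max(betas, key=lambda b: gamma_entropy(1.0, b))
print("entropy is largest on the grid at beta =", best)
print("S(1, 0.5), S(1, 1), S(1, 2) =",
      gamma_entropy(1.0, 0.5), gamma_entropy(1.0, 1.0), gamma_entropy(1.0, 2.0))
```

Both the clustered and the dispersed cases give a lower entropy than the random case at the same mean.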

The 'maximum likelihood' estimates $\hat{\mu}, \hat{\beta}$ of $\mu, \beta$ can be expressed in terms of the mean and
mean logarithm of a set of independent observations $X = \{X_1, X_2, \ldots, X_n\}$. These estimates are obtained
in terms of the properties of $X$ by maximizing the 'log-likelihood' function

$$l_X(\mu, \beta) = \log \mathrm{lik}_X(\mu, \beta) = \log\left(\prod_{i=1}^{n} f(X_i; \mu, \beta)\right)$$

with the following result

$$\hat{\mu} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{6}$$

$$\log\hat{\beta} - \psi(\hat{\beta}) = \log\bar{X} - \overline{\log X} \tag{7}$$

where $\overline{\log X} = \frac{1}{n} \sum_{i=1}^{n} \log X_i$ and $\psi(\beta) = \Gamma'(\beta)/\Gamma(\beta)$ is the
digamma function, the logarithmic derivative of the gamma function.
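
A minimal sketch of this estimation in Python follows. It solves the equation for $\hat{\beta}$ by bisection,
using the fact that $\log\beta - \psi(\beta)$ decreases from $+\infty$ to $0$. The function names, the
bracketing interval, the synthetic sample and the digamma routine are our own assumptions for illustration;
no catalogue data are involved.

```python
import math
import random


def digamma(x: float) -> float:
    """psi(x) for x > 0 via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))


def fit_gamma_mle(samples):
    """Return (mu_hat, beta_hat) from the sample mean and the mean logarithm."""
    n = len(samples)
    mean = sum(samples) / n
    mean_log = sum(math.log(x) for x in samples) / n
    s = math.log(mean) - mean_log      # non-negative by Jensen's inequality
    lo, hi = 1e-6, 1e6                 # assumed bracket for beta
    for _ in range(200):               # bisection on a logarithmic scale
        mid = math.sqrt(lo * hi)
        if math.log(mid) - digamma(mid) > s:
            lo = mid                   # left side too large: beta must increase
        else:
            hi = mid
    return mean, math.sqrt(lo * hi)


# Synthetic 'clustered' sample: shape beta = 0.5, mean mu = 2 (scale = mu / beta).
rng = random.Random(0)
true_mu, true_beta = 2.0, 0.5
data = [rng.gammavariate(true_beta, true_mu / true_beta) for _ in range(20_000)]
mu_hat, beta_hat = fit_gamma_mle(data)
print("mu_hat =", mu_hat, "beta_hat =", beta_hat)
```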

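The usual Riemannian information metric on the parameter space $S = \{(\mu, \beta) \in \mathbb{R}^+ \times \mathbb{R}^+\}$
is given by

$$ds_S^2 = \frac{\beta}{\mu^2}\, d\mu^2 + \left(\psi'(\beta) - \frac{1}{\beta}\right) d\beta^2 \quad \text{for } \mu, \beta \in \mathbb{R}^+. \tag{8}$$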

For more details about the geometry see [18], [5], [7]. The 1-dimensional subspace parametrized by $\beta = 1$
corresponds to the available 'random' processes. A path through the parameter space $S$ of gamma models
determines a curve

$$c : [a, b] \to S : t \mapsto (c_1(t), c_2(t)) \tag{9}$$

with tangent vector $\dot{c}(t) = (\dot{c}_1(t), \dot{c}_2(t))$ and norm $\|\dot{c}\|$ given via (8) by

$$\|\dot{c}(t)\|^2 = \frac{c_2(t)}{c_1(t)^2}\, \dot{c}_1(t)^2 + \left(\psi'(c_2(t)) - \frac{1}{c_2(t)}\right) \dot{c}_2(t)^2. \tag{10}$$

The information length of the curve is

$$L_c(a, b) = \int_a^b \|\dot{c}(t)\|\, dt \tag{11}$$

and the curve corresponding to an underlying Poisson process has $c(t) = (t, 1)$, so $t = \mu$ and $\beta = 1 =$ constant,
and the information length is $\log\frac{b}{a}$.

As we know from elementary geometry, arc length is often difficult to evaluate analytically because it contains
the square root of the sum of squares of derivatives. Accordingly, we sometimes use the 'energy' of the curve
instead of length for comparison between nearby curves. Energy is given by integrating the square of the norm
of $\dot{c}$

$$E_c(a, b) = \int_a^b \|\dot{c}(t)\|^2\, dt, \tag{12}$$

so in the case of the curve $c(t) = (t, 1)$, the energy is $\frac{b - a}{ab}$. It is easily shown that a curve of
constant $\mu$ has $c(t) = (\text{constant}, t)$ where $t = \beta$ and $\dot{c}(t) = (0, 1)$; this has energy
$\log\frac{a}{b} + \psi(b) - \psi(a)$.
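
Information length and energy can also be evaluated by direct quadrature along a curve, which gives a check
on closed forms such as those quoted above. The Python sketch below is one way to do this under stated
assumptions: the trigamma routine (recurrence plus asymptotic series), the midpoint rule, the step count
and the two example curves with their endpoints are all our own illustrative choices.

```python
import math


def trigamma(x: float) -> float:
    """psi'(x) for x > 0 via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)    # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv * (1.0 + 0.5 * inv
                           + inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0)))


def speed_squared(c1: float, c2: float, dc1: float, dc2: float) -> float:
    """Squared norm of the tangent (dc1, dc2) at the point (mu, beta) = (c1, c2)."""
    return (c2 / c1 ** 2) * dc1 ** 2 + (trigamma(c2) - 1.0 / c2) * dc2 ** 2


def length_and_energy(curve, a: float, b: float, steps: int = 20_000):
    """Midpoint-rule length and energy; curve(t) returns (c1, c2, dc1, dc2)."""
    h = (b - a) / steps
    length = energy = 0.0
    for i in range(steps):
        q = speed_squared(*curve(a + (i + 0.5) * h))
        length += math.sqrt(q) * h
        energy += q * h
    return length, energy


def poisson_line(t: float):
    """Curve of random models: mu = t, beta = 1."""
    return t, 1.0, 1.0, 0.0


def constant_mean(t: float):
    """Curve of unit mean: mu = 1, beta = t."""
    return 1.0, t, 0.0, 1.0


print("Poisson line, mu from 1 to 3:", length_and_energy(poisson_line, 1.0, 3.0))
print("Unit mean, beta from 0.5 to 2:", length_and_energy(constant_mean, 0.5, 2.0))
```

Along the Poisson line the squared speed is $1/t^2$, so the quadrature reproduces $\log(b/a)$ for the
length; along a curve of constant mean the squared speed is $\psi'(t) - 1/t$.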

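Locally, minimal paths joining nearby pairs of points in $S$ are given by the autoparallel curves or geodesics [7]
defined by (8). Some typical sprays of geodesics emanating from various points are provided in [8]. The Gaussian
curvature of the surface $S$ [7] actually controls all of the geometry of geodesics and it is given by

$$K_S(\mu, \beta) = \frac{\psi'(\beta) + \beta\, \psi''(\beta)}{4\left(\beta\, \psi'(\beta) - 1\right)^2} \quad \text{for } \mu, \beta \in \mathbb{R}^+ \tag{13}$$

$$K_S(\mu, \beta) \to -\frac{1}{4} \quad \text{as } \beta \to 0 \tag{14}$$

$$K_S(\mu, \beta) \to -\frac{1}{2} \quad \text{as } \beta \to \infty \tag{15}$$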
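
The limiting values of the curvature can be checked numerically. The sketch below assumes the closed form
$K_S = (\psi' + \beta\psi'')/(4(\beta\psi' - 1)^2)$, which follows from the standard curvature formula for
an orthogonal metric applied to (8); the polygamma routine and the sample values of $\beta$ are our own
illustrative assumptions.

```python
def polygamma_1_2(x: float):
    """Return (psi'(x), psi''(x)) for x > 0 via recurrence and asymptotic series."""
    d1 = d2 = 0.0
    while x < 6.0:
        d1 += 1.0 / x ** 2         # psi'(x)  = psi'(x + 1)  + 1/x^2
        d2 -= 2.0 / x ** 3         # psi''(x) = psi''(x + 1) - 2/x^3
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    d1 += inv * (1.0 + 0.5 * inv
                 + inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0)))
    d2 -= inv2 * (1.0 + inv
                  + inv2 * (0.5 - inv2 * (1.0 / 6.0 - inv2 / 6.0)))
    return d1, d2


def gaussian_curvature(beta: float) -> float:
    """Gaussian curvature of the gamma surface; it does not depend on mu."""
    d1, d2 = polygamma_1_2(beta)
    return (d1 + beta * d2) / (4.0 * (beta * d1 - 1.0) ** 2)


for beta in (1e-4, 0.370, 1.0, 10.0, 1e3):
    print(f"beta={beta}: K={gaussian_curvature(beta):.5f}")
```

The curvature is negative throughout the sampled range and approaches the two limits quoted above at the
ends.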
Void diameter statistics

For a general account of large-scale structures in the universe, see Fairall [9]. Kauffmann and Fairall [15]
developed a catalogue search algorithm for nearly spherical regions devoid of bright galaxies and obtained a
spectrum for diameters of significant voids. This had a peak in the range 8-11 $h^{-1}$ Mpc, a long tail stretching
at least to 64 $h^{-1}$ Mpc, and is compatible with the recent extrapolation models of Baccigalupi et al. [1], which
yield an upper bound on void diameters of about 100 $h^{-1}$ Mpc. We shall return to the data of Kauffmann and
Fairall later in this section.

Simulations of Sahni et al. [21] found strong correlation between void sizes and primordial gravitational potential
at void centres; void topologies tended to simplify with time. Ghigna et al. [12] found in their simulations
that void statistics are sensitive to the passage from CDM to CHDM models. This suggested that the void
distribution is sensitive to the type of dark matter but not to the transfer function between types. CHDM
simulations gave a void probability in excess of observations, CDM simulations being somewhat better. Vogeley
et al. [22] compared void statistics with CDM simulations of a range of cosmological models; good agreement
was achieved for samples of very bright galaxies ($M < -19.2$) but for samples containing fainter galaxies
the predicted voids were reported to be 'too empty'. Ghigna et al. [11] compared observational data with
Gaussian-initiated $N$-body simulations in a 50 $h^{-1}$ Mpc box and found that at the 2-8 $h^{-1}$ Mpc scales the void
probability for CHDM was significantly larger than observed. Ghigna et al. [13] compared simulated galaxy
samples with the Perseus-Pisces redshift survey. The void probability function did discriminate between DM
and CDM models, the former giving particularly good agreement with the survey.

Little and Weinberg [19] used similar $N$-body simulations, and found that the void probability was insensitive to
the shape of the initial power spectrum. Watson and Rowan-Robinson [23] found that standard CDM predictors
do yield reasonably good void probability function estimates whereas Voronoi foam models performed less well.

Lachièze-Rey and da Costa were of the opinion that the available samples of galaxies were insufficiently
representative, because of greater apparent frequency of larger voids in the southern hemisphere. Bernardeau [2]
started from a Gaussian field and derived an expression for the void probability function, obtaining

$$P_0(V) \approx \exp\left(-\frac{nV}{(nV\sigma^2)^{3/7}}\right) \quad \text{for large } nV\sigma^2,$$

with $\sigma^2$ the variance of the number of galaxies in volume $V$. This distribution has an extended large tail because
it is asymptotically more like the exponential of $-(nV)^{4/7}$ than the Poisson case, which decays like the exponential
of $-nV$. Cappi et al. [3] examined the dependence of the void probability function on scale for a range of galaxy
cluster samples, finding a general scaling to occur up to void diameters of about 100 $h^{-1}$ Mpc. Kerscher et al. [16]
used the void probability function to obtain spatial statistics of clusters on scales 10-60 $h^{-1}$ Mpc, obtaining
satisfactory agreement in a model with a cosmological constant and in a model with breaking of scale invariance
of perturbations.

For our model we consider the diameter $D$ of a spherical void with volume $V = \frac{\pi}{6} D^3$, having distribution (3).
Something close to the random variable $D$ has direct representation in some theoretical models, for example as
polyhedral diameters in Voronoi tessellations [22], [25], [?].
The probability density function for $D$ is given by

$$f_D(D; \mu, \beta) = \frac{3}{\Gamma(\beta)} \left(\frac{\pi\beta}{6\mu}\right)^{\beta} D^{3\beta - 1}\, e^{-\pi\beta D^3/(6\mu)}. \tag{16}$$

Then the mean $\bar{D}$, variance $\mathrm{Var}(D)$ and coefficient of variation $CV(D)$ of $D$ are given, respectively, by

$$\bar{D} = \left(\frac{6\mu}{\pi\beta}\right)^{1/3} \frac{\Gamma(\beta + \frac{1}{3})}{\Gamma(\beta)} \tag{17}$$

$$\mathrm{Var}(D) = \left(\frac{6\mu}{\pi\beta}\right)^{2/3} \frac{\Gamma(\beta)\,\Gamma(\beta + \frac{2}{3}) - \Gamma(\beta + \frac{1}{3})^2}{\Gamma(\beta)^2} \tag{18}$$

$$CV(D) = \sqrt{\frac{\Gamma(\beta)\,\Gamma(\beta + \frac{2}{3})}{\Gamma(\beta + \frac{1}{3})^2} - 1} \tag{19}$$
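
The diameter statistics follow from the change of variable $V = \frac{\pi}{6} D^3$ applied to the gamma density
for $V$. The Python sketch below, with function names, integration range and step count that are our own
assumptions, evaluates the resulting density, mean and coefficient of variation of $D$, and compares the
closed-form mean with a crude numerical integral at the parameter values quoted for the fitted data.

```python
import math


def diameter_pdf(d: float, mu: float, beta: float) -> float:
    """Density of the diameter D when V = (pi/6) D^3 is gamma with mean mu, shape beta."""
    if d <= 0.0:
        return 0.0
    k = math.pi * beta / (6.0 * mu)
    log_p = (math.log(3.0) - math.lgamma(beta) + beta * math.log(k)
             + (3.0 * beta - 1.0) * math.log(d) - k * d ** 3)
    return math.exp(log_p)


def mean_diameter(mu: float, beta: float) -> float:
    """Mean of D: (6 mu / (pi beta))^(1/3) * Gamma(beta + 1/3) / Gamma(beta)."""
    scale = (6.0 * mu / (math.pi * beta)) ** (1.0 / 3.0)
    return scale * math.exp(math.lgamma(beta + 1.0 / 3.0) - math.lgamma(beta))


def cv_diameter(beta: float) -> float:
    """Coefficient of variation of D; it depends on beta only."""
    log_ratio = (math.lgamma(beta) + math.lgamma(beta + 2.0 / 3.0)
                 - 2.0 * math.lgamma(beta + 1.0 / 3.0))
    return math.sqrt(math.expm1(log_ratio))


def integrate_pdf(mu: float, beta: float, upper: float = 8.0, steps: int = 100_000):
    """Midpoint-rule estimates of (total probability, mean) on [0, upper]."""
    h = upper / steps
    mass = mean = 0.0
    for i in range(steps):
        d = (i + 0.5) * h
        w = diameter_pdf(d, mu, beta) * h
        mass += w
        mean += w * d
    return mass, mean


mu, beta = 1.244, 0.370
print("closed-form mean:", mean_diameter(mu, beta))
print("numerical (mass, mean):", integrate_pdf(mu, beta))
print("CV(D):", cv_diameter(beta))
```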
Figure 2: Probability density function Equation (16), for distributions of void diameters with unit mean, $\mu = 1.244$
and $\beta = 0.370$. These parameter values are the best fit for the data of Kauffmann and Fairall [15], also
used in Figure 3.
Figure 3: Histograms of fractions of space occupied by different classes of void diameters, from Kauffmann
and Fairall [15] (right hand columns) and predicted from the void diameter distribution Equation (16) (left
hand columns) fitted to the same coefficient of variation and mean; the parameters found were $\mu = 1.244$ and
$\beta = 0.370$. The class centres are in units of 200 km/s with a mean close to 7 units.
The fact that the coefficient of variation (19) depends only on $\beta$ gives a rapid fitting of data to (16). Numerical
fitting to (19) gives $\beta$; this substituted in (17) yields an estimate of $\mu$ to fit a given observational mean. By
way of illustration, this has been done in Figure 3 for the ZCAT/SRC data from Kauffmann and Fairall [15]
Figure 8a (cf. also [2] Figure 6.5), both set to unit mean diameter; the true mean for that catalogue was about
7 $h^{-1}$ Mpc. The fitted values were $\mu = 1.244$ and $\beta = 0.370$. This fit is not particularly good if the reported
peaks are not an artifact but, qualitatively, we observe that the fitted value $\beta = 0.370$ is apparently significantly
less than the value 1 that would correspond to the random model. We conclude tentatively that, for the ZCAT/SRC
data subjected to the Kauffmann and Fairall search algorithm, the new model suggests clumping rather than
dispersion in the underlying stochastic process. A Mathematica NoteBook for performing the fitting procedure
is available from the author, who would welcome more sets of data.
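
The two-step fitting procedure can be sketched in a few lines of Python. This is not the Mathematica
notebook mentioned above; it is a minimal illustration under our own assumptions, in which $\beta$ is found
by bisection from an observed coefficient of variation (which decreases with $\beta$) and $\mu$ then follows
from the observed mean. The function names, the bracketing interval and the round-trip check are
hypothetical choices.

```python
import math


def cv_diameter(beta: float) -> float:
    """Coefficient of variation of the void diameter; a function of beta only."""
    log_ratio = (math.lgamma(beta) + math.lgamma(beta + 2.0 / 3.0)
                 - 2.0 * math.lgamma(beta + 1.0 / 3.0))
    return math.sqrt(math.expm1(log_ratio))


def fit_from_mean_and_cv(mean_obs: float, cv_obs: float):
    """Return (mu, beta) matching an observed mean and coefficient of variation."""
    lo, hi = 1e-3, 1e3                 # assumed bracket for beta
    for _ in range(200):               # bisection on a logarithmic scale
        mid = math.sqrt(lo * hi)
        if cv_diameter(mid) > cv_obs:
            lo = mid                   # CV too large: beta must increase
        else:
            hi = mid
    beta = math.sqrt(lo * hi)
    gamma_ratio = math.exp(math.lgamma(beta + 1.0 / 3.0) - math.lgamma(beta))
    # Invert mean = (6 mu / (pi beta))^(1/3) * Gamma(beta + 1/3) / Gamma(beta).
    mu = (math.pi * beta / 6.0) * (mean_obs / gamma_ratio) ** 3
    return mu, beta


# Round-trip check: start from the coefficient of variation implied by
# beta = 0.370 and ask for unit mean diameter.
mu_fit, beta_fit = fit_from_mean_and_cv(1.0, cv_diameter(0.370))
print("mu =", mu_fit, "beta =", beta_fit)
```

The round trip is only a self-consistency check on the formulas; a fit to catalogue data would supply the
observed mean and coefficient of variation in place of the values used here.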

Suppose that the parameter space $S$ of gamma-based models is a meaningful representation of the evolutionary
process and that some coordinates such as $(\mu = 1.244, \beta = 0.370)$ in $S$ represent current data. Then geodesics
in $S$ through this point represent some kind of extremal path. Moreover, we may consider a vector field $U$ on
$S$ such that $(\mu = 1.244, \beta = 0.370)$ is the present endpoint of an integral curve of $U$, the initial point of this
curve being presumably in an epoch when less clustering ($\beta > 0.370$), a random state ($\beta = 1$) or even dispersion
($\beta > 1$) was present at higher density. It would be interesting to investigate the various candidate cosmologies
for their appropriate vector fields, via the statistics of matter and voids they predict. The present
gamma-related parametric statistical models provide the means to convert the catalogue statistics into the coordinate
parameters and a background geometrization of the statistics on which dynamical processes may be formulated.